FIG - loading a GIF file
Introduction
This article briefly describes the GIF file format, how the LZW decompression works and gives an overview of the supplied 80x86 source-code. Yep, for all you 80x86 people out there... here is finally some code to load a GIF!!
For such a widely used graphics file format the GIF it is still a mysterious Pandora's box of CompuServe Patents and lack of 80x86 friendly info. I see the question "How do I load a GIF?" very often in the various assembly newsgroups, so hopefully this will address the problem.
The ZIP files
No, not staring Gillian Anderson... The ZIP file contains source code for 32-bit Flat/P-mode. I've tried to keep it all simple and give plenty of comments were possible. The code was written for TASM and tested on both TASM 5.0 and MASM 6.11 (sorry, no N-ASM version).
GIF32.ASM - 32-bit source code for Flat/pmode
GIF.INC - include file (contains GIF structures & equates)
TINYHUGI.GIF - 640x480x16bit piccy I drew resized to 160x100x8
I used the nice WDOSX dos extender by Michael Tippach for the "GIF32.ASM" 32-bit version of the GIF loader, you can find this at:
http://www.wuschel.demon.co.uk/index.html
You MAY need the DLINK.EXE program by Adam Seychell (depending on your TASM and possibly MASM version). This can be found in the DOS32B35.ZIP at:
http://www.programmersheaven.com/zone5/cat19/index.htm
The GIF file format
The supplied code was written to only load and decompress a single image from a GIF file, so it does NOT deal with all the other extension sub-blocks (like comments, graphic-control data, plain-text or application data). So for animated GIFs you will need to adapt the source code to suit your own needs.
This shouldn't be as bad as it first sounds. The code already skips past these sub-block extensions and does the core work of decompressing the LZW bitstream (which is THE hardest part, IMHO).
A basic GIF87a/GIF89a file looks something like this:
gifhdr - the GIF header (Date/Version & "FIG" signature ;)
giflsd - the LSD (Logical Screen Descriptor)
- the GCT (Global Color Table, ie. the palette)
gifxxx - the various extension blocks (AEX, CEX, PTE etc..)
giftid - the TID (The Image Descriptor)
- the LCT (Local Color Table, ie. one image's palette)
- the TBI (Table Based Image - LZW bitstream)
As far as I know the extensions and TID/LCT/TBI sections can be freely repeated for GIF files with multiple images. The extensions supply timing information and a method to intergrate text information between images.
A loader overview
The GIF32 works like this:
1) Open the file
2) Read the gifhdr (header) and check the "G" "I" "F" signature)
3) Read the LSD (Logical Screen Descriptor) (you may wish to extract this screen size information from here so you can allocate/select the correct screen size & resolution)
4) Read the GCT (Global Color Table) if any (the GCT_FLAG in the [giflsd.LsdFlags] indicates if GCT is present)
5) Read the next byte and check for extension OR TID signature (02C hex) (you can skip the extension sub-blocks by sitting in a loop and looking for a BlockSize = 0, which marks the end of a section)
6) Read the TID (The Image Descriptor) (this begins with the 02C hex Separator and contains info about the LZW compressed image which follows any optional LCT)
7) Read the LCT (Local Color Table) (this is a local palette used for the next image(s) only, its in the same format as the GCT which is 8-bits per R,G,B component. The LCT_FLAG in the [giftid.TidFlags] denotes if a LCT is present.)
8) Read the LZW code size, then decompress the TBI data sub-blocks. (The first byte, the ImageBits code size, is used to init the LZW decompression code token values. The actual LZW bitstream is broken into sub-blocks of between 1 and 255 bytes, just like the extension sections. Each block is preceeded by a byte count, a 00 means the end of the bitstream.)
The LZW algorithm
Sorry, but I'll skip most of the explaination about how the LZW works (there is plenty of information available on the net), besides you already have some 80x86 source-code. ;)
Oh, okay... here is a rough decompressor..
ImageBits = 2 ... 8 ; This is the minimum LZW code size
; (i.e. number of bits in image pixels)
phraze[4097] bytes ; used to build up a reversed LZW phraze
suffix[4097] bytes ; last byte of each phraze
prefix[4097] words ; token of the sub-phraze before the suffix
Init:
FutureCode = 0
OldCode = 0
BytesLeft = 0
CodeBits = ImageBits + 1
TopSlot = 1 << CodeBits
FlushCode = 1 << ImageBits
EndCode = FlushCode + 1
Slot = FlushCode + 2
BytesLeft = 0
BitPos = 8 ; any number > 7 !!
Depack:
Token = ReadGIFCode() ; get the next LZW token from the
; bitstream.
if Token = EndCode then done.. ; end of image ?
if Token = FlushCode then ...
CodeBits = ImageBits + 1
TopSlot = 1 << CodeBits
Slot = EndCode + 1
while Token = FlushCode
Token = GetGIFCode()
wend
if Token = EndCode then done ...
if Token >= Slot then Token = 0
OldCode = Token
FutureCode = Token
OutputByte(Token)
else..
Code = Token
Rev = 4097
if Code >= Slot then ...
OldCode = Code
FutureCode = Code
Rev - 1
phraze[Rev] = Code
while Code > EndCode ...
Rev - 1
phraze[Rev] = suffix[Code]
Code = prefix[code]
wend
Rev - 1
phraze[Rev] = Code
if Slot < TopSlot then ...
FutureCode = Code
suffix[Slot] = Code
prefix[Slot] = OldCode
OldCode = Code
Slot + 1
if Slot >= TopSlot and CodeBits < 12 then ...
TopSlot << 1
CodeBits + 1
while Rev < 4097
OutputByte( phraze[Rev] )
Rev + 1
wend
goto Depack
That's the main decompression, the only thing missing is the "ReadGIFCode" routine which constructs a N-bit token code from the TBI bitstream.
The GIF Bitstream
The bitstream is stored in an Intel friendly low, high-byte style with bits stored in a bit-0 first, bit-7 last order. The actual bytes which make up the entire bitsteam is broken up into smaller blocks, each block contains between 1 and 255 bytes. Each block is preceeded by a length count, exactly like the other extension blocks. A length count = 0 means the end of the bitstream.
byte 1 byte 2 byte 3
76543210 76543210 76543210
hgfedcba ponmlkji --vutsrq <-- 22 bits abcdefghijklmnopqrstuv
The above 22 bits (3 bytes) could be stored in a block like this:
76543210
byte 0 00000011 = 3 the length byte
byte 1 hgfedcba bits 0...7
byte 2 ponmlkji bits 8...15
byte 3 --vutsrq bits 16...21
A LZW token consists of between 3 and 12 bits (depending on both the number of ImageBits and the number of phrazes stored in the dictionary). A "GetBits" routine has to deal with both constructing a token code from a number of bitstream bytes and also dealing with crossing the TBI block boundaries.
ReadGIFCode:
if BitPos > 7 then ...
BitRack = ReadGIFByte()
BitPos = 0
Token = BitRack >> BitPos
GotBits = -BitPos + 8
while GotBits < CodeBits
BitRack = ReadGIFByte()
Token = Token or (BitRack << GotBits)
GotBits + 8
wend
BitPos = 8 - (GotBits - CodeBits)
Token = Token and (TopSlot - 1)
The following code will read a bitstream byte and deals with the caching of a single TBI block (which is between 1 and 255 bytes in length).
ReadGIFByte:
if BytesLeft = 0 then ...
BytesLeft = ReadFileByte()
for n = 1 to BytesLeft
cache[n-1] = ReadFileByte()
next n
BytePos = 0
n = cache[BytePos]
BytePos + 1
return (n)
Closing words
I think I've said enough here. You know where the source code is and you can always fire up your favourite search engine if you still need some more info about the LZW algorithm or the GIF87a/GIF89a specs...
Happy viewing...